Theoretical models

The term theoretical model refers to a molecular model obtained, wholly or in part, by the use of theory, such as homology modeling, energy minimization, molecular mechanics or molecular dynamics. Such theoretical models are distinguished from empirical models, which are usually obtained by X-ray crystallography or nuclear magnetic resonance (NMR).

The distinction between theoretical and empirical models is important because when theoretical models are compared with empirical models, the theoretical models often contain significant errors. In contrast, when the structure of a particular macromolecule is determined using empirical methods by different laboratories, or both by crystallography and NMR, the agreement is usually quite good.

A total of 1,390 theoretical models were deposited in the Protein Data Bank in the past, but they were removed from the main database in 2002. The structures displayed on the pages that Proteopedia generates automatically for these theoretical models should be interpreted with caution (see Category:Theoretical Model).

Empirical Models
Empirical models are not theoretical models, but are mentioned here for the sake of completeness. Empirical models, usually determined by X-ray crystallography or nuclear magnetic resonance, are the most reliable and accurate models available. Methods for judging the reliability and quality of empirical models are discussed at Quality assessment for molecular models. Independent determinations of the same protein by empirical methods generally agree within about 0.5 Å RMS for carbon alphas (reference needed).
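
The RMS agreement quoted above can be made concrete with a small calculation. The following is a minimal sketch, not a reference implementation: the function name `ca_rmsd` and the toy coordinates are hypothetical, and the two models are assumed to be already superposed and to list the same alpha carbons in the same order. In practice an optimal superposition is computed before the deviation is measured.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Root mean square deviation (in the units of the input, e.g. Angstroms)
    between two equal-length lists of (x, y, z) alpha-carbon positions.

    Assumption: the two models are already superposed, and position i in one
    list corresponds to position i in the other.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("both models must contain the same number of atoms")
    if not coords_a:
        raise ValueError("at least one atom is required")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Invented coordinates for three alpha carbons in two hypothetical models.
model_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_b = [(0.0, 0.3, 0.0), (3.8, 0.0, 0.4), (7.6, 0.0, 0.0)]

rmsd = ca_rmsd(model_a, model_b)
print(f"C-alpha RMSD: {rmsd:.2f}")
```

Because the deviations are squared before averaging, a few badly placed atoms raise the value more than many small discrepancies do.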

Homology Models
Homology models, also called comparative models, are obtained by folding a target protein sequence to fit an empirically-determined template model. The registration between residues in the target and template is determined by an amino acid sequence alignment between the target and template sequences. Errors or uncertainties in the sequence alignment result in errors or uncertainties in the homology model. Provided there is sufficient sequence identity between the target and template, the main chain in homology models is usually mostly correct. However, the positions of sidechains in homology models are usually incorrect.
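
Sequence identity between target and template is read off the alignment. The sketch below shows one way to compute it, under stated assumptions: the function name `percent_identity` and the two aligned sequences are invented for illustration, gaps are written as `-`, and identity is defined as identical residues divided by the number of aligned columns that contain no gap. Other definitions use a different denominator (for example, the length of the shorter sequence), so values from different tools are not always directly comparable.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap.

    Both arguments are rows of the same pairwise alignment, so they must have
    equal length. Returns 0.0 when no column is free of gaps.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned_columns = 0
    identical = 0
    for t, s in zip(aligned_target, aligned_template):
        if t == gap or s == gap:
            continue  # gap columns do not count toward the denominator here
        aligned_columns += 1
        if t.upper() == s.upper():
            identical += 1
    if aligned_columns == 0:
        return 0.0
    return 100.0 * identical / aligned_columns


# Invented example alignment (not taken from any real protein).
target_row = "MKT-AYIAK"
template_row = "MKSLAYLAK"

identity = percent_identity(target_row, template_row)
print(f"{identity:.1f}% identity")
```

A shifted gap in the alignment changes which residues are paired, which is how alignment errors propagate into the model.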

Empirically-determined templates with adequate sequence identity are available for less than half of all protein sequences. One of the major goals of structural genomics is to increase the sequence diversity of the available empirically-determined structures that can be used as templates for homology modeling.

A number of free servers have libraries of homology models generated in advance for protein sequences, and many will create homology models for a submitted protein sequence. For more, please see Homology modeling servers. When no suitable template exists, the Structural Genomics Target Database should be searched with your sequence. In some cases, a sequence-similar protein has already been crystallized and diffracted, but the model may not have been completed, or the completed model may not yet have been deposited in the PDB. In such cases, it may be worthwhile to contact the team that has made the most progress on a closely related sequence.

See also
 * Homology modeling
 * The list of resources at User:Wayne Decatur/Homology Modeling.

Examples

 * Structure of E. coli DnaC helicase loader concerns a homology model.

Ab Initio Models
When there is no template with sufficient sequence identity to use for homology modeling, one can use ab initio or de novo folding theory to predict the structure of a target protein sequence. Such theory is about 70% successful at predicting secondary structure. Tertiary structure prediction has modest success for small protein chains (80 amino acids or less), but is generally unable to predict the fold for longer chains. In about one out of four cases of small domains of less than 85 amino acids, the best predictions are within about 1.5 Å (RMS for carbon alphas) of the true structure. (Independent determinations of the same protein by empirical methods generally agree within about 0.5 Å RMS for carbon alphas.)

The success of fold prediction methods is assessed every two years in the Critical Assessment of Techniques for Protein Structure Prediction (CASP) competitions. Crystallographers submit the sequences of proteins whose structures they have solved but not yet published. Modelers predict the folds, which are then compared with the subsequently published structures. Beginning in CASP5 (2002), the ability to predict intrinsic disorder was included. There are also competitions to predict protein-protein docking interactions.

Assessment of CASP results is done in a double-blind manner: the predictors do not know the empirical structures, and the assessors do not know the identities of the predictors, which are coded. In CASP8 (2008), there were 13 "template free" targets, that is, sequences with no significant sequence identity to any empirically solved entry in the PDB. These are the most difficult to predict, as they must be predicted by ab initio methods. 102 groups submitted predictions. Assessing the quality of a prediction is not simple, given that even "good" predictions can have high root mean square (RMS) deviations for alpha carbon alignment, e.g. due to a hinge. Several assessment methods were used, each emphasizing different qualities. A number of groups submitted good predictions for six of the thirteen targets. None of the submitted models was judged to be satisfactory for four of the thirteen targets.
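
The hinge problem mentioned above can be illustrated with a toy chain. This is only a sketch under stated assumptions: the coordinates are invented, the prediction is assumed to be superposed on the first half of the true structure, and the 2 Å cutoff is an arbitrary choice for illustration rather than the measure used by the CASP assessors. The second half of the predicted chain is internally identical to the true one but swung 90° about a hinge, so a single global RMS value is large even though every residue of the first half is placed exactly.

```python
import math


def ca_distances(model_a, model_b):
    """Per-residue distances between corresponding alpha carbons."""
    if len(model_a) != len(model_b):
        raise ValueError("both models must contain the same number of atoms")
    return [math.dist(a, b) for a, b in zip(model_a, model_b)]


def rms(distances):
    """Root mean square of a list of per-residue distances."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def fraction_within(distances, cutoff):
    """Fraction of residues whose deviation is at most `cutoff`."""
    return sum(1 for d in distances if d <= cutoff) / len(distances)


SPACING = 3.8  # approximate distance between consecutive alpha carbons, in Angstroms

# True structure: ten alpha carbons in a straight line along x.
true_ca = [(SPACING * i, 0.0, 0.0) for i in range(10)]

# Prediction: residues 0-4 are exact; residues 5-9 keep the same spacing
# but are rotated 90 degrees about residue 4 (the hinge), pointing along y.
hinge_x = SPACING * 4
predicted_ca = [
    (SPACING * i, 0.0, 0.0) if i <= 4 else (hinge_x, SPACING * (i - 4), 0.0)
    for i in range(10)
]

distances = ca_distances(true_ca, predicted_ca)
global_rmsd = rms(distances)
close_fraction = fraction_within(distances, cutoff=2.0)

print(f"global RMSD: {global_rmsd:.1f}")
print(f"fraction of residues within 2.0 of the true position: {close_fraction:.2f}")
```

A measure that counts well-placed residues, or that evaluates each domain separately, gives credit for the correctly modelled part that the global RMS value hides.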